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Abstract 

An algorithm for sampling from non-log-concave multivariate distri¬ 
butions is proposed, which improves the adaptive rejection Metropolis 
sampling (ARMS) algorithm by incorporating the hit and run sampling. 

It is not rare that the ARMS is trapped away from some subspace with sig¬ 
nificant probability in the support of the multivariate distribution. While 
the ARMS updates samples only in the directions that are parallel to di¬ 
mensions, our proposed method, the hit and run ARMS (HARARMS), 
updates samples in arbitrary directions determined by the hit and run 
algorithm, which makes it almost not possible to be trapped in any iso¬ 
lated subspaces. The HARARMS performs the same as ARMS in a sin¬ 
gle dimension while more reliable in multidimensional spaces. Its perfor¬ 
mance is illustrated by a Bayesian free-knot spline regression example. 

We showed that it overcomes the well-known ‘lethargy’ property and de¬ 
cisively find the global optimal number and locations of the knots of the 
spline function. 

Keywords: Adaptive Rejection Metropolis Sampling; Hit-and-Run Algo¬ 
rithm; Regression Splines; Free-knot Splines, Empirical Bayesian Method. 
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1 Introduction 


Adaptive reject Metropolis sampling (ARMS) is a combination of adaptive re¬ 
jection sampling (ARS) [1] and a Metropolis-Hastings sampling [2]. ARS was 
proposed for sampling from univariate log-concave conditional distributions. As 
an extension of ARS, ARMS [5] removes the log-concavity restriction on the 
simulated probability density functions. When the density to be sampled from 
is multidimensional, a straight-forward approach is to embed ARMS within a 
Gibbs sampler, provided that all one-dimensional conditional densities can be 
simulated by ARMS. This type of approaches have been used widely in various 
areas since the first paper on ARMS [3]. For some more examples, m, [5],[6], 
[7], and [8] are all using ARMS in Gibbs sampler for multivariate distributions. 
However, if the probability density function is multimodal in a multidimensional 
space, Gibbs sampler can be easily trapped around some of the modes indef¬ 
initely, resulting in inefficient and even unreliable samples. In section 2, We 
showed that applying Gibbs sampler with ARMS algorithm to sample from a 
2-dimensional distribution is trapped around 3 modes, while the target distribu¬ 
tion has total 4 similar modes and is fairly smooth. Although, in practice, there 
are methods, such as using different proposal distributions, to prevent Gibbs or 
Metropolis-Hastings sampler being trapped in some subset of the support of the 
target distribution, whose probability is distinctly less than 1. These methods 
usually implemented by trial and error. They may still give impressions of con¬ 
vergence while some important aspects of the target distributions are missing. 
Hence, there is no guarantee from them that the samplers have sampled from 
all the subspaces around (almost) every local modes, especially sampling from 
multidimensional spaces and the locations of the modes are unknown priori. 

While Metropolis-Hastings algorithm may be stuck in only part of the sup¬ 
port of the distribution, applying ARMS to sampling from one-dimensional 
distribution does not generally have this problem, since it is an accept-reject 
method with an adaptive instrumental density function. In multidimensional 
situation, it is mainly due to the approach through Gibbs sampler that the 
ARMS can be trapped in some distinctly less than 1 probability subset of the 
support of the target distribution. Particularly, we believe that such being 
trapped disadvantage is due to that the Gibbs sampler updates the multidi¬ 
mensional samples in a fixed order and only in one dimension for each step. 
Note that Gibbs sampler does not necessarily sample in each dimension in ev¬ 
ery step, it could sample some dimension fewer times than the others and could 
sample in different order of the dimensions. Such alternative sampling scheme 
may relieve the being trapping issue at some level. However, such alternative 
schemes are not standard and need to be done by trial and fail. Therefore, 
again, this is not a very reliable way to solve the being trapped issue of ARMS 
in multi-dimensional distribution sampling. 

We propose an algorithm, which incorporate hit and run method and ARMS, 
instead of embedding ARMS in Gibbs sampler, to sample from general multi¬ 
variate distributions. Hit and Run algorithm was introduced by [5] and m 
Its property has been studied by m and [12]. Also, hit and run and Gibbs 
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samplers were compared by m- Although there’s no theoretical result, the em¬ 
pirical study showed that the hit and run method estimators have less bias and 
standard errors than the corresponding Gibbs sampler estimators. The authors 
suggested that this is due to the less autocorrelations of the hit and run method 
than the Gibbs sampler. Plus, since hit and run method generates uniformly 
distributed directions , so it suffers the problem of being stuck in only part of 
the support of the distribution much less likely. Due to the difficulty of generat¬ 
ing a signed distance for the hit and run method, griddy, acceptance/rejection, 
or Metropolis methods have been used and the resulting algorithms have been 
studied. These methods more or less scarifice efficiency. In this paper, ARMS 
algorithm is used for generate the signed distance. The resulting algorithm has 
not been proposed and studied explicitly in literature based on the authors’ 
best knowledge. Combining the reliability of hit and run algorithm and the 
efficiency of ARMS, we can avoid the unreliable and inefficient issue due to the 
Gibbs sampler. 

It is worth to mention that the hit and run is just one of the algorithms 
that deals with direction sampling. For example, adaptive direction sampling 
(ADS) was introduced in [14], where the authors mentioned the Gibbs sampler 
has much slower convergence rate in high dimensional situation compared to the 
ADS. Incorporating ADS and ARMS looks promising and worth to be studied 
and compared with the hit and run method with ARMS, but is out of the scope 
of this paper. 

As an example of the application of HARARMS, it was applied to solve a 
free knots regression spline problem. Spline function is widely used in the anal¬ 
ysis of two-dimensional data {yi,Xi). Early papers [H] [16] have addressed the 
importance of the role of splines in smoothing and regression analysis. Ruppert 
et al. A method of equally spaced knots was proposed in m- This knot place¬ 
ment method is simple and straightforward to implement. However, the nature 
of this method does not guarantee that knots are placed at all critical locations, 
at which the underlying regression function possesses sharp changes. The other 
type of methods is called curve-fitting with free-knot splines, where the number 
of knots and their locations are determined from the data. To discuss the full 
benefits of the spline approach, free-knot methods become an important and 
difficult problem. 

Traditional methods m for the optimal knot selection is to add knots in 
intervals where the residuals are inadmissible large. In stead of parameters, 
the positions of the knots can be taken as the selection of functional type in an 
ordinary curve fitting problem [18] . The knots should be chosen as to correspond 
to the overall performance of the data fitting. A new knot is assigned until the 
sum of squared residuals gets to a minimum. The recent literature propose 
several approaches to automatic knot selection that needs to search all possible 
models. Many of them are based on stepwise regression ideas. For example, 
[IS], [IS], ED- A stochastic search method was proposed in [22] for optimizing 
the knot locations by using a continuous genetic algorithm. Bayesian methods 
for fitting free-knot splines have been considered in some literatures [23], [24] . 
[25], and [26]. These approaches employ continuous random search methodology 
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through the use of Monte Carlo Markov chain algorithms although the objective 
is not minimization of Equation. A review and comparison of some of these 
approaches is given by m- 

Although most of the automatic knot selection procedures mentioned here 
have exhibited good performance, they all face a nonlinear problem with a 
lethargy property m- The ’’lethargy” property is intrinsic to free-knot spline 
problems. In the other word, there may be local minima, saddle points, or even 
local maxima for the optimal knot placement. The ’’lethargy” typically cause 
many problems for the derivative-based optimization methodology and could 
lead to a poor estimation. 

We show that the proposed HARARM algorithm has good properties for 
solving global optimization problems, in particular, finding the the numbers 
and locations of the free knot regression model efficiently and reliably. The 
overall procedure of the free-knot idea is investigated by simulated data sets. 

The proposed HARARMS algorithm is described in detail in Section 2, where 
a brief description, which has more details than above, to the related algorithms 
are also given. Then by comparing and discussing the performances of sampling 
in multidimensional space by ARMS with Gibbs sampler and Hit and Run 
algorithm in Section 3, we show that HAEARMS is significantly better in mul¬ 
tidimensional situation, especially there are multi-modes, which is the case in 
almost all real world problems. In the last section, the HARARMS is used to 
fit the data to the free knots regression spline, and the performance is assessed. 

2 Algorithms 

In the Bayesian context, the objectives of the modeling are usually posterior dis¬ 
tributions of the unknown parameters. High dimensional distribution samples 
can be generated from the Gibbs sampling [28] straightforwardly for Bayesian 
inference. At each iteration of the Gibbs sampling, the parameter is updated by 
sampling a new value from its full conditional distribution. The full conditional 
distribution of a parameter is its distribution conditional on the data and on 
the current values of all the other parameters. 

Eor the completeness, we introduce the adaptive rejection sampling(ARS) 
and the adaptive rejection Metropolis sampling(ARMS) first. We define x to be 
the random variable to be sampled. 

The ARS extends the rejection sampling [53] by adaptively adjusting the 
sampling distribution. Eor a basic rejection sampling method, it requires that 
a sampling distribution g(x) for a random variable x can be easily drawn and 
Mg{x) > /(x) given a finite constant M. The rejection sample method is very 
useful for sampling a full conditional distribution since the integration of /(x) 
is not needed. In practice, this algorithm involves an evaluation and potential 
rejection step after a sample is drawn from g{x). Eor most of the time, it is 
a challenging task to pick up an appropriate sample distribution g{x) and the 
constant M in order to reduce the ratio of the rejection. 

The ARS [1] is proposed to use a sequence of sampling distributions gm(x) 
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defined by piecewise linear functions hmi^)- 


hm{x) = min[L;_i_i(x; Sm), Lz+l,i+2(x; Sm)], X; < X < Xi+i, (1) 


where Sm = {xpZ = O,--- ,m + 1} is a set of points in the support of /. 
Lij(x; Sm) denotes the straight line through points [x/, ln/(x/)] and [xj, ln/(xj)]. 
Defining Mm = / exp(/im(x))(ix, The sampling distribution is given by: 


5m (x) 


exp(hm(x)) 

Mm 


Now the algorithm of ARS is derived as the following: 


( 2 ) 


0. Initialize m and Sm- 

1. Generate x ^ gm{x), U ^ U[op] 

> exp/i!l\x)} ’ update Sm to = 5^ U {x}; otherwise, accept x. 

For univariate cases, the average of iterations of the ARS to accept one point 
depends on the initial Sm and the target distribution /. The biggest limitation 
of ARS is to require / to be log-concave since /im(x) needs to be an envelope 
of ln/(x). The ARMS [3] was proposed to deal with non-log-concave densities. 
The ARMS incorporates a Metropolis-Hastings algorithm step to ARS. The 
Metropolis-The Hastings algorithm is briefly covered as the following: 

0. Initialize Xs 

1. Generate X ~ g(x|xs),^ U[o 4 ]. 

2. If [/ > min{l, x^+i = x*; otherwise accept x. 

The Metropolis-Hastings algorithm(MH) [5] improved the Metropolis algorithm 
[30] by generating samples from a proposal distribution q{x\x'). The perfor¬ 
mance of the MH algorithm depends on the quality of the proposal distribution. 
Similar to other MGMG algorithms, samples from the MH algorithm may stay 
in a local maximum of / due to an unsuitable proposal distribution. 

To carry out the ARMS algorithm [3], let Sm = {x;; I — 0, - ■ ■ , m-|-l} denote 
a current set of points in ascending order, For I < Z < m, let Lij(x-, Sm) de¬ 
note the straight line through points [x/,ln/(x/)] and [xj,ln/(xj)]. A piecewise 
linear function hii+i{x) = max[Li_/+i(x), min{L/_i,/(x), L/+i_;+ 2 (x)}], where 
X e (x/,xi+i). hm{x) = {h/,;+i(x); I = I,-- - ,m- 1}. Let /io,i(x) = Lo,i(x) 
and hm,m+ii^) = Lm,m+i{^) define the first and the last piecewise linear func¬ 
tions, respectively. The sampling distribution is then gm{x.) = exp{hm(x)}/Mr„, 
where Mm = / exp{/im(x)}(ix. We further define y^cur and x„ei« as the current 
value and new sample from /(x). ARMS starts from ho,i(x) = Lo.i(x) and 
construct hi_i+i(x); we include a new point and relabel points; we then recon¬ 
struct (x). For multivariate distribution, we reserve the bold font x for the 
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objective random variable vector. Let k denote the dimension of the variable 
and k = 1, - ■ ■ ,K. Assuming /(x(fe) |x(_fe)) is a target univariate distribution to 
be selected, where X{-k) is the parameters except X(j.). For ease of notation we 
write /(x) below instead of f{x(^k)\^(-k))- The algorithm of ARMS embedded 
in Gibbs sampler for multivariate distribution is described as the following: 


For fc = 1 to AT, let /(x) denote f {x(^k)\^(-k)) in the following steps. Do 
1. Simulate x^i from gm{^) and U ~ U[o,i] until 


U < 


/(xa) 

exp{h„(xA)}' 


2. Generate U ^ U[o,i] and take 


Hg) = 


XA 

^cur 


U < min 


2^ /(xa) min[/(Xe„r),exp(fe„t(Xeur))] 
’ /(Xcur).min[/(XA) exp(/lm(xA))l 


otherwise. 


The Hit-and-Run algorithm can be thought of as random-direction Gibbs: 
in each step of the hit-and-run algorithm, instead of updating x along one of 
the dimensions, we update it along a randomly generated direction that is not 
necessarily parallel to any dimension. More precisely, the sampler is defined in 
two steps: first, choose a direction d from some positive density on the unit 
sphere d!d = 1. Then, similar to Gibbs, sample the new point Xnew along 
the line specified by d and the distance by z, where z is from the marginal 
one-dimensional density on the line that specified by d. The ARMS with Hit- 
and-Run (HARARMS) algorithm is as the following: 

0. Initiate x^, generate dg = Ug/\/uf + ■ ■ ■ + Uq, Ug ~ U[o i],and set f*{z) = 
f{xs + d'^z), initiate M and Sm for 

1. Simulate za from gm{z) and U ~ U[oa] until 


U < 


fjzA) 

exp{hmizA)}' 


2. Generate U ^ U[oa] and take 


I TT ^ J 1 /(za) min[/(zc„r).exp(/t„(zcur-))] i 

Xs-i+d ZA t/<mm<i, exp(;i„(zA))J G 

Xs_i -I- cfzcur otherwise. 
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3 Comparing sampling performances 

The ARMS with Hit-and-Run reduces a multiple dimensional question into one 
dimensional ARMS sampling. It breaks the procedure into two steps: (1). pick 
up a random direction d; (2). implement a one dimensional ARMS for the 
random variable z. An important advantage of ARMS with Hit-and-Run over 
regular multiple dimensional ARMS is that it is much more likely to reach the 
isolated local areas by evaluating one dimensional ARMS in a random direction 
searching, while regular multiple dimensional ARMS only searches the space in 
the direction that is parallel to one of dimensions in each updating step. For 
example, in a two dimensional case, the Gibbs sampler sampling the new points 
either in the vertical or the horizontal direction from the current point. The 
following example shows that even for sampling from a mixture of 4 bivariate 
normal distribution with similar mixing probability mass and variances, the 
Gibbs sampler with ARMS can be trapped around only 3 of the 4 modes and 
totally missing the forth one. This is mainly due to that the forth mode is 
hardly reached by searching in the 2 dimensional space along the directions 
only parallel to the axes in each step. 

Assuming that we have a two dimensional random variable, x = {a;(i), X( 2 )}'^, 
and the the objective sampling distribution is /(x) as below: 

4 

/w = 

2=1 


In this example, we set p,i = (5,-5)'^, /i 2 = (5,5)"'", fis = (—5,5)"'", and /i 4 = 
(13,13)"'". S = Dia(;(0.5,0; 0,0.5), pi = 0.2, p 2 = 0.3, P 3 = 0.2 and p 4 = 
1-Pi-P2 - Pa- 

Sampling X from /(x) is an easy task if we know that /(x) is a mixture of 
normal distributions. We can pick one sampling distribution among the four 
in terms of their probabilities pi, and then sample x from the selected normal 
distribution. Gonsider /(x) as if it did not have the explicit and easy form of 
mixture to sampling form, and treat it as a general distribution, which is the 
case as sampling from general complex functions. Then we use both the Gibbs 
sampler with ARMS embedded and the HARARMS to sample x from the /(x), 
and compare their performances. After s=10000 iteration, we generate the 
sampling of x from both Gibbs sampler with ARMS embedded and HARARMS. 
To make a reference, contour plots are made by the grid search on a:(i) and X(^ 2 )i 
which refer as xl and x2 in the Figure [T] 

The comparison of the performances of the two algorithms is also summa¬ 
rized by the following table: 
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Log-lik 



0 


-100 


-200 


-300 


400 


-500 


0 


-100 


-200 


-300 


-400 


-500 


lund only 3 of the 4 
)und every modes. 









Table 1: Comparing ARMS performances 


Parameter 

Actual 

HARARMS 

ARMS with Gibbs 


Ml 

M2 

P 

Ml 

M2 

P 

Ml 

M2 

P 

(Comp 1) 

5 

-5 

0.20 

5.00 

-4.98 

.20 

4.99 

-5.03 

.29 

(Comp 2) 

5 

5 

0.30 

4.99 

5.02 

.30 

4.99 

5.02 

.42 

(Comp 3) 

-5 

5 

0.30 

-4.97 

4.99 

.22 

-4.98 

4.99 

.29 

(Comp 4) 

13 

13 

0.20 

12.98 

13.02 

.28 





The HARARMS estimates all the parameters well, while the Gibbs sampler misses 
one of the modes and gives severely biased estimation on the mixing probability. 


4 Free Knots Regression splines 


Now we apply the HARARMS to a free knot spline model. Assuming that we 
would like to model a dependent variable, yi, using one independent variable, 
Xi, where i = 1, • • • , n. Starting with the straight line regression model: 


2/i = /3o + l3iXi + Ci, 


where ^ N(0,cr^). Denote x = (xi,-- - ,Xn)^ and y = {yi,-'' In the 

framework of spline regression, the basis functions for fitting such simple linear 
regression are dehned as: 

1 Xi 


X = 


To handle nonlinear structure, regression models consisting of several differently 
sloped lines are developed. One method is to introduce a basis function that is 
zero to the left of k and then is a positive value from k onward. A mathematical 
way of expressing this truncated basis function is 


(x, - k )+, 


where, for any number u, u+ is equal to u if u is positive and is equal to 
0 otherwise. Therefore, the regression model with multiple truncated basis 
functions is 

K 

Vi = /5o + X! i = 1,* • • ,n. (4) 

k^l 

Now the corresponding basis function matrix is: 


X = 


1 Xi (Xi - Ki) + 
1 X„ {Xn - Kl) + 


(Xi - Kk) + 
(x„ - Kk)+ 


In (U) the value of k corresponding to the function (x^ — k)+ usually refers to 


9 





















a knot. A set of such functions is called a linear spline basis. Equation (|3]) is 
called a spline model with a linear spline basis. Assume that the parameters 
/3o, /3i, as well as b = (6i, • • • , 6^, • • • , Bk) are unknown while k’s in matrix X 
is known, the model can be easily solved by the multiple linear regression. 

However, if the number and locations of the knots, the k’s, are unknown, 
the model will be not so easy to handle, and the choice of them is a critical 
problems. Wold [18] describes the choice of knot positions as the selection of 
functional type. The knots should be chosen in terms of the overall performance 
of the data fitting. Simplified methods m, [H]) works well when the number 
of knots is large (about 30-40). The extra caution is necessary in using the 
large number of knots, since the great flexibility of splines can make overfitting 
the data. Ideally we should put as few knots as possible. The knots should 
be placed at all critical locations, at which the underlying regression function 
possesses sharp changes. 

The difficulty in detecting the optimal number and location of knots comes 
from the ’’‘lethargy’” property (|3I]) intrinsic to free-knot splines as discussed 
in Section 1. Such local maxima problem for free-knot splines method is rarely 
discussed, and it actually is the big barrier for designing an attractive free-knot 
algorithm. We show the local maxima problem for model (|3|) in detail as the 
following: 

To find the optimal number and locations of the knots, we define the loss 
function by the sum of squared errors, just the same as the classical method, 
which minimizes the loss function and finds the optimal. Let B = (/3o,/3i,b)"'". 
Based on ([4|) and assumption that the error is normally distributed, the full 
likelihood is: 

p{y,\K,B,a'^) = —^exp[-icr2(yi- x,(/c)B)2] (5) 

V27rCT 2 

In ([S]), both of knot locations k and the coefficients B and tr^ need to be opti¬ 
mized. Given the number and locations of knots, the optimization in terms of 
the coefficients, B and can be obtained by standard linear least-squares so¬ 
lution. In the other word, we have the residual sum of squares for a given knots’ 
locations by substituting B and with the ordinary least squares solution B 
and an unbiased estimation a^. Consider an empirical Bayesian method: 

p(2/*|k) oc J_ . exp[-^g2(-y^ _ Xi{K)B f] (6) 

v27ra^ ^ 

where B is an empirical solution, i.e. B = {X'X)~^X'y. cr^ = XiB)/(n — 

q—1), where q is the number of unknown coefficients of (|4|). It is now well known 
that the likelihood function corresponding to (|6|) tends to have multiple maxima 
and exhibits a lethargy property near extremes that can lead to premature 
convergence at nonoptimal knot locations. 

To show the multiple maxima of the likelihood function, we give a noninfor- 
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mative prior to k. Then the posterior distribution of k is: 

n 

p{K\y) (xY[p{y^\K■)■ (7) 

i^l 

Therefore, the Log-likelihood function for k is 

n 

l{i^:y) = '^^og{p{y^\K)). ( 8 ) 

i=l 

Now we generate example data sets, and show the existence of the local 
maxima in terms of the location of the knots. Generate a simulated Dataset 
1, {xi,yi), from (|T]) with the parameters values as summarized in the following 
Tabled! and the generated data is plotted in Figurfd) 

Table 2: Simulated Dataset 1 


Type 

Value 

Knot location, k 

k=(200, 300, 400, 500, 700, 900) 

Coefficients, B 

Po = -0.5, Pi = -0.5, and b = (0.5,1.0, -2.0,2.5, -3.0,3.5) 

Error, 

ei = iV(0,302) 

Independent variable, x 

a:= (1,2,--- ,1000)^ 

Dependent variable, y 

Sampled based on (Hj) 


Similar to the setting of dataset 1, let k equal to (700) or (700,500), and 
then plot the curve of the likelihood functions ([8]) versus the location of k in 
Figure |3I correspondingly. Using one and two dimensional k is to make the 
graph easy to read, without loss of generality. The multiple maxima are clear 
in the graph. Specifically, for iF = 1, the largest value of the Log-likelihood 
(-5630.89) occurs with knot location at 730; a local maxima occurs at 160, and 
the Log-likelihood is -5698.99. For K = 2, Let both of Ki and K 2 have values 
from 100 to 990 with increments of 10. The value of the Log-likelihood is then 
plotted for each combination of the two knot locations on the right side of Figure 
[3| You can see that the largest value of the Log-likelihood (-5339.10) occurs 
with knot location at a splines function with ki = 670 and K 2 = 520; A local 
maxima Log-likelihood (-5406.38) appears at a splines function with ki = 700 
and K 2 = 270; The other local maxima Log-likelihood (-5535.50) appears at the 
combination of ki = 90 and K 2 = 770. Through this visualization, It is clear 
that three local Log-likelihood maximums exist within the splines function with 
two knots. This indicates that the log-likelihood function for the general free 
knot spline with unknown number of knots should be multimodal in the high 
dimensional space, we can see that the location of a global maximum can 
therefore be assured only through a global grid search, which unfortunately 
becomes computationally infeasible quickly as the dimension increases. As we 
discussed above, for sampling from such target functions, the HARARMS will 
be better than Gibbs sampler. 
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X 

Figure 2: y is generated from o with defined knot locations and parameters 
values, a; = (1, 2, • • • , 1000)"'’ 


Table[3]shows another setting for generating a quadratic spline function with 
5 knots. The data generated from the this setting is named as dataset 2. 


Table 3: Simulated Dataset 2 - for Quadratic Spline base 


Type 

Value 

Knot location, k 

k=(0.2, 0.4, 0.5, 0.7, 0.9) 

Coefficients, B 

/3o = -0.5, j3i = 0.5, 132 = -0.5, and b = (1, -3,5, -7,15) 

Error, 

e, = V(0,0.32) 

Independent variable, x 

x={l,2,--- ,1000)'^ 

Dependent variable, y 

Sampled based on (jH) 


The following 2 tables and 2 figures are the result of using HARARMS to find 
fit the free knot spline model to dataset 1 and dataset 2 correspondingly. Since 
the number of the knots in the true model is unknown, we fitted the models 
with 1, 2,..., 10 knots correspondingly. As the number of the knots increases, 
the maximum of the likelihood increases, even when the number is greater than 
the true number. However, by comparing the increasing among the different 
numbers of knots in the model, we can easily identify the true model. When 
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101 201 301 401 501 601 701 801 901 


Location of Knot 1 


Figure 3: Top: Gridding the knot location with one knot splines, plot the log 
likelihood of (jS]) to demonstrate the multiple local maximum problem. Bottom: 
Gridding the knot location with two knot splines, plot the log likelihood of ([HI) 
to demonstrate the multiple local maximum problem. 


the true number is reached, adding more knots will only increase the maximum 
of the log-likelihood function marginally. Of course, based on this finding, using 
AIG or BIG as creteria can decisively and correctly get the model selection done. 
The estimation of the location of the knots when the number of the knots in 
the model is selected correctly, is quite accurate, as we can see in the following 
tables. 
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Table 4: Linear regression splines 


Case 

Ki 

K2 

K3 

K4 

K5 

Kq 

K’j 

KS 

Kg 


Log-likelihood 

Actual 

200 

300 

400 

500 

700 

900 






1-knot 

727 










-5631.07 

5% 

713 










-5633.01 

95% 

740 










-5630.87 

2-knot 

518 

671 









-5339.78 

5% 

508 

661 









-5342.61 

95% 

530 

679 









-5339.11 

3-knot 

519 

698 

899 








-5040.63 

5% 

511 

694 

894 








-5043.65 

95% 

526 

702 

904 








-5039.79 

4-knot 

185 

540 

698 

899 







-4925.23 

5% 

172 

532 

694 

894 







-4928.52 

95% 

198 

548 

701 

903 







-4923.74 

5-knot 

245 

404 

503 

697 

899 






-4802.55 

5% 

232 

398 

498 

694 

896 






-4805.52 

95% 

258 

409 

507 

700 

903 






-4800.69 

6-knot 

206 

286 

400 

503 

698 

899 





-4796.77 

5% 

168 

255 

394 

496 

695 

896 





-4800.75 

95% 

223 

309 

407 

507 

700 

903 





-4794.36 

7-knot 

211 

287 

400 

503 

697 

899 

969 




-4796.57 

5% 

188 

264 

394 

498 

694 

884 

899 




-4799.86 

95% 

234 

314 

406 

507 

700 

903 

987 




-4794.46 

8-knot 

207 

227 

280 

359 

397 

503 

697 

899 



-4796.20 

5% 

155 

185 

230 

313 

383 

498 

694 

895 



-4799.98 

95% 

224 

241 

309 

398 

410 

508 

700 

903 



-4794.09 

9-knot 

30 

213 

272 

312 

396 

462 

504 

698 

899 


-4795.85 

5% 

13 

118 

210 

280 

376 

411 

498 

694 

896 


-4799.03 

95% 

86 

231 

311 

368 

405 

507 

546 

701 

903 


-4793.67 

10-knot 

27 

179 

218 

277 

313 

397 

474 

505 

697 

899 

-4795.03 

5% 

12 

147 

182 

232 

271 

370 

393 

499 

695 

895 

-4797.91 

95% 

61 

222 

279 

305 

365 

407 

508 

550 

700 

903 

-4793.02 


The generated data is following a linear spline function with 6 knots at 200, 300, 400, 
500, 700, and 900. The n-th big row shows the estimation for the location of the knots, 
corresponding to the fitted model has n knots. Also, the 90% credible intervals to these 
estimation are reported. In the last column, the value of the log-likelihood function 
corresponding to the estimated values and the boundaries of the intervals are listed. 
Not surprisingly, the log-likelihood function value of the estimations increases as the 
number of the knots increases. However, when the number of the knots in the model 
reaches the truth, the increasing in log-likelihood could be neglected, compared to the 
increasing before the number of knots in the model reaches the true number. 
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Table 5: Quadratic regression splines 


Case 

Ki 

K2 

K3 


K5 

Kq 

K7 

Ks 

Kg 

KlO 

Log-likelihood 

Actual 

0.2 

0.4 

0.5 

0.7 

0.9 







1-knot 

0.77 










-1853.74 

5% 

0.77 










-1854.35 

95% 

0.77 










-1853.35 

2-knot 

0.59 

0.64 









-1616.39 

5% 

0.57 

0.62 









-1618.44 

95% 

0.61 

0.65 









-1615.81 

3-knot 

0.58 

0.69 

0.90 








-1276.31 

5% 

0.57 

0.68 

0.90 








-1279.13 

95% 

0.59 

0.70 

0.91 








-1275.32 

4-knot 

0.45 

0.48 

0.70 

0.90 







-1205.32 

5% 

0.43 

0.47 

0.69 

0.90 







-1207.94 

95% 

0.47 

0.50 

0.71 

0.91 







-1203.96 

5-knot 

0.19 

0.40 

0.49 

0.70 

0.90 






-1184.38 

5% 

0.15 

0.36 

0.47 

0.69 

0.90 






-1187.97 

95% 

0.24 

0.44 

0.51 

0.71 

0.91 






-1182.81 

6-knot 

0.19 

0.41 

0.49 

0.69 

0.72 

0.90 





-1184.09 

5% 

0.14 

0.38 

0.47 

0.61 

0.69 

0.90 





-1187.57 

95% 

0.23 

0.44 

0.51 

0.71 

0.79 

0.91 





-1182.59 

7-knot 

0.20 

0.39 

0.47 

0.51 

0.70 

0.86 

0.90 




-1183.19 

5% 

0.15 

0.27 

0.41 

0.47 

0.67 

0.69 

0.89 




-1186.42 

95% 

0.26 

0.44 

0.51 

0.70 

0.75 

0.91 

0.96 




-1181.28 

8-knot 

0.10 

0.20 

0.41 

0.48 

0.55 

0.68 

0.74 

0.90 



-1182.46 

5% 

0.01 

0.02 

0.15 

0.39 

0.48 

0.51 

0.68 

0.89 



-1185.40 

95% 

0.25 

0.40 

0.47 

0.51 

0.71 

0.74 

0.91 

0.96 



-1180.74 

9-knot 

0.19 

0.31 

0.43 

0.48 

0.60 

0.68 

0.72 

0.87 

0.91 


-1182.08 

5% 

0.04 

0.14 

0.29 

0.39 

0.47 

0.52 

0.64 

0.69 

0.89 


-1185.53 

95% 

0.26 

0.43 

0.51 

0.60 

0.69 

0.72 

0.90 

0.92 

0.98 


-1180.18 

10-knot 

0.06 

0.18 

0.28 

0.40 

0.47 

0.52 

0.64 

0.69 

0.77 

0.90 

-1181.09 

5% 

0.02 

0.06 

0.15 

0.29 

0.36 

0.44 

0.48 

0.63 

0.68 

0.89 

-1184.02 

95% 

0.24 

0.33 

0.42 

0.50 

0.54 

0.66 

0.71 

0.78 

0.91 

0.94 

-1179.21 


The generated data is following a quadratic spline function with 6 knots at 0.2, 0.4, 0.5, 0.7 and 
0.9. The n-th big row shows the estimation for the location of the knots, corresponding to the 
htted model has n knots. Also, the 90% credible intervals to these estimation are reported. In 
the last column, the value of the log-likelihood function corresponding to the estimated values 
and the boundaries of the intervals are listed. Not surprisingly, the log-likelihood function value 
of the estimations increases as the number of the knots increases. However, when the number 
of the knots in the model reaches the truth, the increasing in log-likelihood could be neglected, 
compared to the increasing before the number of knots in the model reaches the true number. 
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